Compiled under R version 4.3.0 (2023-04-21)

WARNING: edit the working directory to your preferred folder.

This document details all analyses performed in R for the study:

Legendre, L.J., C.A. Rodríguez-Saltos, C.M. Eliason, & J.A. Clarke. 2023. Evolution of the syrinx of Apodiformes including the vocal-learning Trochilidae (Aves, Strisores). Zoological Journal of the Linnean Society.

For more information regarding the study, datasets, and analyses, please refer to the Main Text and Supplementary Information of this paper. If you have any additional questions, feel free to email me at .

Loading packages and functions

library(tidyverse)
library(nlme)
library(car)
library(ape)
library(phytools)
library(bioacoustics)
library(AICcmodavg)
library(evobiR)
library(phylopath)
library(caper)
library(geiger)
library(windex)
library(geomorph)
library(RColorBrewer)
  • Import function fitEvolPar to estimate the best fit of the main parameter (alpha, lambda, or g) in the corresponding evolutionary model (OU, lambda, or EB, respectively)

(Download from GitHub: https://github.com/LucasLegendre/fitEvolPar)

sys.source("fitEvolPar.R", envir = knitr::knit_global())

Acoustic data extraction

  • Data
# Load both datasets
acoudata<-read.csv("notes-filtered_20211202.csv", header=TRUE, sep=";")
datasyr<-read.table("Dataset_syrinx_strisores.txt", header=TRUE)
rownames(datasyr)<-datasyr$Taxon

# Extract species in acoustic dataset for which we have morphological data
acoudataset<-acoudata %>% 
  group_by(species) %>% 
  dplyr::summarize(duration=mean(duration), freq_max_amp=mean(freq_max_amp), freq_min=mean(freq_min),
            freq_max=mean(freq_max), bandwidth=mean(bandwidth))

acoudataset<-as.data.frame(acoudataset); rownames(acoudataset)<-acoudataset$species

vector<-c(datasyr$Taxon[1], "Chaetura_pelagica",
          datasyr$Taxon[c(4:12)],"Oreotrochilus_chimborazo", datasyr$Taxon[c(15:21)])
intersect(acoudataset$species, vector) # Species present in both datasets
##  [1] "Apus_affinis"                "Archilochus_colubris"       
##  [3] "Calliphlox_amethystina"      "Chaetura_pelagica"          
##  [5] "Chordeiles_minor"            "Coeligena_violifer"         
##  [7] "Collocalia_esculenta"        "Eutoxeres_condamini"        
##  [9] "Florisuga_mellivora"         "Heliangelus_amethysticollis"
## [11] "Mellisuga_minima"            "Oreotrochilus_chimborazo"   
## [13] "Patagona_gigas"              "Phaethornis_guy"            
## [15] "Phaethornis_hispidus"        "Phaethornis_superciliosus"  
## [17] "Topaza_pella"
acoumorph<-acoudataset %>% filter(species %in% vector)

# Reformat to match syrinx data
acoumorph<-acoumorph[match(vector,acoumorph$species),] # reorder to match syrinx data
rownames(acoumorph)[c(17,18)]<-datasyr$Taxon[c(19,20)]
acoumorph$species[c(17,18)]<-rownames(acoumorph)[c(17,18)]
acoumorph<-acoumorph[c(1,rep(2,2),3:11,rep(12,2),13:19),]

# Bind two datasets
data<-cbind(datasyr,acoumorph[,c(2:6)])
data<-data[,c(1:11,22:27)]; colnames(data)[12]<-"IM_CSA"
rownames(data)<-data$Taxon

Here, we extract five acoustic traits to add them to the morphological dataset – duration, frequency at maximum amplitude, minimum frequency, maximum frequency, and bandwidth – for more info on each parameter, see vignette for package bioacoustics:

vignette("introduction", package = "bioacoustics")
## starting httpd help server ... done
  • Tree
treeS<-read.nexus("Tree_strisores.trees.nex")
treesyr<-drop.tip(treeS, setdiff(treeS$tip.label, rownames(datasyr)))
plotTree(treesyr, fsize=1.2)

data<-ReorderData(treesyr,data)

Reduce dimensionality of acoustic data using phylogenetic Principal Component Analysis (pPCA)

  • Checking for phylogenetic signal (pPCA assumes a Brownian Motion model for all traits - see Revell, 2009; Uyeda et al., 2015)
# Remove missing data
acouphyl<-data[!is.na(data$duration),c(13:17)]
acoutree<-drop.tip(treesyr, setdiff(treesyr$tip.label,rownames(acouphyl)))

# Phylogenetic signal
var=list(); phy=list()
for (i in 1:5) {
  var<-acouphyl[,i]; names(var)<-rownames(acouphyl)
  phy[[i]]<-phylosig(acoutree, var, method="lambda", test=TRUE)
}
phy
## [[1]]
## 
## Phylogenetic signal lambda : 1.00155 
## logL(lambda) : -88.0979 
## LR(lambda=0) : 17.1352 
## P-value (based on LR test) : 3.48114e-05 
## 
## 
## [[2]]
## 
## Phylogenetic signal lambda : 1.00155 
## logL(lambda) : -162.162 
## LR(lambda=0) : 8.83758 
## P-value (based on LR test) : 0.0029509 
## 
## 
## [[3]]
## 
## Phylogenetic signal lambda : 7.3433e-05 
## logL(lambda) : -161.255 
## LR(lambda=0) : -0.000901409 
## P-value (based on LR test) : 1 
## 
## 
## [[4]]
## 
## Phylogenetic signal lambda : 1.00155 
## logL(lambda) : -166.131 
## LR(lambda=0) : 13.4785 
## P-value (based on LR test) : 0.000241313 
## 
## 
## [[5]]
## 
## Phylogenetic signal lambda : 1.00155 
## logL(lambda) : -158.934 
## LR(lambda=0) : 12.9758 
## P-value (based on LR test) : 0.00031554

The signal is strong for three variables (duration, max. frequency, and bandwidth), but not significant for the other two (frequency at max. amplitude and min. frequency). We will use a lambda model (Brownian Motion with Pagel’s lambda optimized using restricted maximum likelihood).

  • pPCA (using phytools)
pPCA<-phyl.pca(acoutree, acouphyl, method="lambda", opt="REML")
summary(pPCA)
## Importance of components:
##                                PC1         PC2        PC3         PC4
## Standard deviation     415.0202569 151.6623811 48.3139582 4.827427729
## Proportion of Variance   0.8716654   0.1164037  0.0118129 0.000117935
## Cumulative Proportion    0.8716654   0.9880692  0.9998821 1.000000000
##                                 PC5
## Standard deviation     1.792324e-05
## Proportion of Variance 1.625714e-15
## Cumulative Proportion  1.000000e+00
plot(pPCA); biplot(pPCA)

pPCA$lambda # very low value, as expected
## [1] 0.1837253

PC1 explains 88% of the variance, PC2 only 11% – we can use PC1 as a character representing acoustic parameters for the species in our dataset.

  • Let us add PC1 to our dataset
PC1<-pPCA$S[,1]
PC1<-c(PC1[1:12],NA,PC1[13:17],NA,PC1[18:19])
names(PC1)[c(13,19)]<-rownames(acoumorph)[c(20,19)]
data<-cbind(data, PC1)
  • Let us also check phylogenetic signal for morphoanatomical traits
var=list(); phy=list(); phydat=data.frame()
for (i in 4:12) {
  var<-data[,i]; names(var)<-rownames(data)
  var2<-var[!is.na(var)]
  treesyrvar<-drop.tip(treesyr,setdiff(treesyr$tip.label,names(var2)))
  phy[[i]]<-phylosig(treesyrvar, var2, method="lambda", test=TRUE)
  phydat[i,1]<-colnames(data)[i]; phydat[i,2]<-phy[[i]]$lambda; phydat[i,3]<-phy[[i]]$P
}
colnames(phydat)<-c("Variable", "Lambda", "P")
phydat[c(4:12),]
### Checking evolutionary model with fitContinuous
models=c("BM", "OU", "EB", "rate_trend","lambda", "white") # (For more info, check '?fitContinuous')

var=list(); fit=list(); mod=list()
for (i in 4:12) {
  var<-log(data[,i]); names(var)<-rownames(data)
  var2<-var[!is.na(var)]
  treesyrvar<-drop.tip(treesyr,setdiff(treesyr$tip.label,names(var2)))
  for (m in 1:length(models)) {
    fit[[m]]=fitContinuous(treesyrvar, var2, model=models[m], ncores=2)
  }
  mod[[i]]<-modSel.geiger(fit[[1]],fit[[2]],fit[[3]],fit[[4]],fit[[5]],fit[[6]])
}
mod
## [[1]]
## NULL
## 
## [[2]]
## NULL
## 
## [[3]]
## NULL
## 
## [[4]]
##          K    logLik      AICc deltaAICc Weight Evidence ratio
## fit[[4]] 3  49.02015 -90.62853    0.0000      1   1.000000e+00
## fit[[1]] 2 -15.00120  34.66906  125.2976      0   1.614457e+27
## fit[[3]] 3 -14.94029  37.29234  127.9209      0   5.993284e+27
## fit[[2]] 3 -15.00120  37.41416  128.0427      0   6.369670e+27
## fit[[5]] 3 -15.00120  37.41416  128.0427      0   6.369670e+27
## fit[[6]] 2 -27.37592  59.41850  150.0470      0   3.822077e+32
## 
## [[5]]
##          K     logLik     AICc deltaAICc Weight Evidence ratio
## fit[[6]] 2  -4.082598 12.87108  0.000000 0.5580   1.000000e+00
## fit[[5]] 3  -3.539262 14.57852  1.707445 0.2376   2.348373e+00
## fit[[2]] 3  -3.689804 14.87961  2.008529 0.2044   2.729899e+00
## fit[[4]] 3 -13.985357 35.47071 22.599637 0.0000   8.080696e+04
## fit[[1]] 2 -15.925669 36.55722 23.686142 0.0000   1.391170e+05
## fit[[3]] 3 -15.925762 39.35152 26.480445 0.0000   5.625428e+05
## 
## [[6]]
##          K    logLik     AICc deltaAICc Weight Evidence ratio
## fit[[2]] 3 -16.94350 41.29877  0.000000 0.7127       1.000000
## fit[[5]] 3 -18.63204 44.67584  3.377069 0.1317       5.411545
## fit[[6]] 2 -20.71333 46.09333  4.794562 0.0648      10.993246
## fit[[4]] 3 -19.78713 46.98603  5.687262 0.0415      17.178030
## fit[[1]] 2 -21.21266 47.09199  5.793215 0.0393      18.112595
## fit[[3]] 3 -21.21272 49.83721  8.538440 0.0100      71.465871
## 
## [[7]]
##          K    logLik     AICc deltaAICc Weight Evidence ratio
## fit[[5]] 3 -12.65971 32.73118  0.000000 0.7849       1.000000
## fit[[2]] 3 -14.63783 36.68742  3.956238 0.1086       7.229132
## fit[[1]] 2 -16.84765 38.36196  5.630776 0.0470      16.699653
## fit[[4]] 3 -15.72212 38.85600  6.124818 0.0367      21.378999
## fit[[3]] 3 -16.84770 41.10717  8.375988 0.0119      65.890493
## fit[[6]] 2 -18.31153 41.28973  8.558546 0.0109      72.187955
## 
## [[8]]
##          K     logLik     AICc deltaAICc Weight Evidence ratio
## fit[[1]] 2  -9.212933 23.28301  0.000000 0.3446       1.000000
## fit[[2]] 3  -8.366069 24.57829  1.295283 0.1803       1.911028
## fit[[4]] 3  -8.572344 24.99084  1.707832 0.1467       2.348827
## fit[[5]] 3  -8.650226 25.14661  1.863597 0.1357       2.539071
## fit[[6]] 2 -10.306662 25.47047  2.187458 0.1154       2.985386
## fit[[3]] 3  -9.212967 26.27209  2.989079 0.0773       4.457284
## 
## [[9]]
##          K     logLik     AICc deltaAICc Weight Evidence ratio
## fit[[1]] 2  -8.041426 20.94000  0.000000 0.5265       1.000000
## fit[[4]] 3  -8.040869 23.92789  2.987897 0.1182       4.454649
## fit[[3]] 3  -8.040938 23.92803  2.988034 0.1182       4.454956
## fit[[2]] 3  -8.041426 23.92901  2.989011 0.1181       4.457132
## fit[[5]] 3  -8.041426 23.92901  2.989011 0.1181       4.457132
## fit[[6]] 2 -14.441956 33.74105 12.801059 0.0009     602.163690
## 
## [[10]]
##          K    logLik     AICc deltaAICc Weight Evidence ratio
## fit[[1]] 2  1.750625 1.204632 0.0000000 0.3603       1.000000
## fit[[2]] 3  2.826509 1.846981 0.6423492 0.2613       1.378746
## fit[[4]] 3  2.509289 2.481421 1.2767893 0.1903       1.893439
## fit[[5]] 3  1.750625 3.998750 2.7941176 0.0891       4.043290
## fit[[3]] 3  1.750591 3.998817 2.7941854 0.0891       4.043428
## fit[[6]] 2 -1.845972 8.397826 7.1931945 0.0099      36.473912
## 
## [[11]]
##          K     logLik      AICc deltaAICc Weight Evidence ratio
## fit[[2]] 3  1.2034974  5.093005  0.000000 0.4416       1.000000
## fit[[1]] 2 -0.7733693  6.252621  1.159616 0.2473       1.785695
## fit[[4]] 3  0.2556109  6.988778  1.895773 0.1711       2.580250
## fit[[5]] 3 -0.7016153  8.903231  3.810225 0.0657       6.720164
## fit[[3]] 3 -0.7734162  9.046832  3.953827 0.0612       7.220423
## fit[[6]] 2 -3.7121227 12.130128  7.037122 0.0131      33.735856
## 
## [[12]]
##          K    logLik     AICc deltaAICc Weight Evidence ratio
## fit[[6]] 2 -14.60581 34.21162  0.000000 0.7097   1.000000e+00
## fit[[5]] 3 -14.44931 37.08043  2.868815 0.1691   4.197157e+00
## fit[[2]] 3 -14.78187 37.74556  3.533948 0.1212   5.853116e+00
## fit[[4]] 3 -24.22837 56.63856 22.426944 0.0000   7.412232e+04
## fit[[1]] 2 -27.30210 59.60419 25.392577 0.0000   3.265336e+05
## fit[[3]] 3 -27.30215 62.78611 28.574495 0.0000   1.602776e+06

High and significant values of lambda for all traits except distance TL-labia and IM CSA.

Phylogenetic path analysis (PPA)

Using phylopath.

  • Subset the data to relevant characters
# Trachea length as % trachea length/total vocal tract length
TracheaRatio<-(data[,8]/(data[,8]+data[,9]))*100; names(TracheaRatio)<-rownames(data)

PPAdata<-cbind(data[,c(1,4:7)], TracheaRatio, data[,c(10,11,18)])
  • Preliminary tests of evolutionary model to pick the best model for PPA
Modnames = paste(c("BM", "OU", "Lambda", "EB", "OLS"), sep = " ")
# For more information, see '?corClasses' in ape

# With body mass as predictor
bodymass<-PPAdata$Body_mass; names(bodymass)<-paste(rownames(PPAdata))
for (i in (1:7)) {
    var2<-log10(PPAdata[,i+2])
    names(var2)<-rownames(PPAdata)
    var<-subset(var2, !is.na(var2))
    treetest<-drop.tip(treesyr, setdiff(treesyr$tip.label, names(var)))
    bodymass2<-log10(subset(bodymass, !is.na(var2)))
    datatest<-as.data.frame(cbind(bodymass2,var))
    spp<-names(var)
    model=list()
    model[[1]]=gls(var~bodymass2, datatest,
                  correlation=corBrownian(phy=treetest, form=~spp), method="ML")
    model[[2]]=gls(var~bodymass2, datatest,
                  correlation=corMartins(fitEvolPar(datatest, treetest,"OU"),
                                         phy=treetest, fixed=TRUE,
                                         form=~spp), method="ML")
    model[[3]]=gls(var~bodymass2, datatest,
                  correlation=corPagel(fitEvolPar(datatest,treetest,"lambda"),
                                       phy=treetest,
                                       fixed=TRUE, form=~spp), method="ML")
    model[[4]]=gls(var~bodymass2, datatest,
                  correlation=corBlomberg(fitEvolPar(datatest,treetest,"EB"),
                                          phy=treetest, fixed=TRUE,
                                          form=~spp), method="ML")
    model[[5]]=gls(var~bodymass2, datatest, method="ML")
    print(colnames(PPAdata)[i+2])
    print(aictab(cand.set=model, modnames=Modnames,sort=TRUE))
}
## [1] 0
## [1] 0.6
## [1] 0.1
## [1] "Distance_TL.labia"
## 
## Model selection based on AICc:
## 
##        K   AICc Delta_AICc AICcWt Cum.Wt    LL
## BM     3   5.69        Inf    NaN    NaN  0.90
## Lambda 3 -20.62        Inf    NaN    NaN 14.06
## EB     3  -9.86        Inf    NaN    NaN  8.68
## OLS    3 -17.70        Inf    NaN    NaN 12.60
## OU     3   -Inf        NaN    NaN    NaN   Inf
## 
## [1] 0
## [1] 0.9
## [1] 0.1
## [1] "TL_CSA"
## 
## Model selection based on AICc:
## 
##        K  AICc Delta_AICc AICcWt Cum.Wt   LL
## BM     3  5.56        Inf    NaN    NaN 0.93
## Lambda 3 -7.52        Inf    NaN    NaN 7.46
## EB     3 -2.15        Inf    NaN    NaN 4.78
## OLS    3  1.74        Inf    NaN    NaN 2.84
## OU     3  -Inf        NaN    NaN    NaN  Inf
## 
## [1] 0
## [1] 0.9
## [1] 0.2
## [1] "Length_labia"
## 
## Model selection based on AICc:
## 
##        K  AICc Delta_AICc AICcWt Cum.Wt    LL
## BM     3  3.57        Inf    NaN    NaN  1.92
## Lambda 3 -3.54        Inf    NaN    NaN  5.47
## EB     3  1.79        Inf    NaN    NaN  2.81
## OLS    3  8.70        Inf    NaN    NaN -0.64
## OU     3  -Inf        NaN    NaN    NaN   Inf
## 
## [1] 0
## [1] 0.9
## [1] 1
## [1] "TracheaRatio"
## 
## Model selection based on AICc:
## 
##        K   AICc Delta_AICc AICcWt Cum.Wt    LL
## BM     3 -28.09        Inf    NaN    NaN 17.97
## Lambda 3 -29.11        Inf    NaN    NaN 18.48
## EB     3 -28.09        Inf    NaN    NaN 17.97
## OLS    3 -20.54        Inf    NaN    NaN 14.19
## OU     3   -Inf        NaN    NaN    NaN   Inf
## 
## [1] 0
## [1] 0
## [1] 0.1
## [1] "Tracheal_diameter_EXT"
## 
## Model selection based on AICc:
## 
##        K   AICc Delta_AICc AICcWt Cum.Wt    LL
## BM     3 -41.79        Inf    NaN    NaN 24.65
## Lambda 3 -54.22        Inf    NaN    NaN 30.86
## EB     3 -53.07        Inf    NaN    NaN 30.29
## OLS    3 -54.22        Inf    NaN    NaN 30.86
## OU     3   -Inf        NaN    NaN    NaN   Inf
## 
## [1] 0
## [1] 0
## [1] 0.1
## [1] "Tracheal_diameter_INT"
## 
## Model selection based on AICc:
## 
##        K   AICc Delta_AICc AICcWt Cum.Wt    LL
## BM     3 -32.74        Inf    NaN    NaN 20.12
## Lambda 3 -48.83        Inf    NaN    NaN 28.17
## EB     3 -46.65        Inf    NaN    NaN 27.07
## OLS    3 -48.83        Inf    NaN    NaN 28.17
## OU     3   -Inf        NaN    NaN    NaN   Inf
## 
## [1] 0
## [1] 0
## [1] 0.1
## [1] "PC1"
## 
## Model selection based on AICc:
## 
##        K  AICc Delta_AICc AICcWt Cum.Wt     LL
## BM     3 35.94        Inf    NaN    NaN -12.97
## Lambda 3 24.83        Inf    NaN    NaN  -7.42
## EB     3 25.29        Inf    NaN    NaN  -7.65
## OLS    3 24.83        Inf    NaN    NaN  -7.42
## OU     3  -Inf        NaN    NaN    NaN    Inf
# With PC1 as response
for (i in (1:7)) {
    var2<-log10(PPAdata[,i+1])
    names(var2)<-rownames(PPAdata)
    var<-subset(var2, !is.na(var2)&!is.na(PC1))
    PC12<-subset(PC1, !is.na(PC1)&!is.na(var2))
    treetest<-drop.tip(treesyr, setdiff(treesyr$tip.label, names(var)))
    datatest<-as.data.frame(cbind(var,PC12))
    spp<-names(var)
    model=list()
    model[[1]]=gls(PC12~var, datatest,
                  correlation=corBrownian(phy=treetest, form=~spp), method="ML")
    model[[2]]=gls(PC12~var, datatest,
                  correlation=corMartins(fitEvolPar(datatest, treetest,"OU"),
                                         phy=treetest, fixed=TRUE,
                                         form=~spp), method="ML")
    model[[3]]=gls(PC12~var, datatest,
                  correlation=corPagel(fitEvolPar(datatest,treetest,"lambda"),
                                       phy=treetest,
                                       fixed=TRUE, form=~spp), method="ML")
    model[[4]]=gls(PC12~var, datatest,
                  correlation=corBlomberg(fitEvolPar(datatest,treetest,"EB"),
                                          phy=treetest, fixed=TRUE,
                                          form=~spp), method="ML")
    model[[5]]=gls(PC12~var, datatest, method="ML")
    print(colnames(PPAdata)[i+1])
    print(aictab(cand.set=model, modnames=Modnames,sort=TRUE))
}
## [1] 0
## [1] 1
## [1] 0.1
## [1] "Body_mass"
## 
## Model selection based on AICc:
## 
##        K   AICc Delta_AICc AICcWt Cum.Wt      LL
## BM     3 358.13        Inf    NaN    NaN -175.26
## Lambda 3 358.13        Inf    NaN    NaN -175.26
## EB     3 352.60        Inf    NaN    NaN -172.50
## OLS    3 364.99        Inf    NaN    NaN -178.69
## OU     3   -Inf        NaN    NaN    NaN     Inf
## 
## [1] 0
## [1] 1
## [1] 0.1
## [1] "Distance_TL.labia"
## 
## Model selection based on AICc:
## 
##        K   AICc Delta_AICc AICcWt Cum.Wt      LL
## BM     3 340.32        Inf    NaN    NaN -166.30
## Lambda 3 340.32        Inf    NaN    NaN -166.30
## EB     3 335.85        Inf    NaN    NaN -164.07
## OLS    3 347.14        Inf    NaN    NaN -169.71
## OU     3   -Inf        NaN    NaN    NaN     Inf
## 
## [1] 0
## [1] 1
## [1] 0.1
## [1] "TL_CSA"
## 
## Model selection based on AICc:
## 
##        K   AICc Delta_AICc AICcWt Cum.Wt      LL
## BM     3 358.87        Inf    NaN    NaN -175.64
## Lambda 3 358.87        Inf    NaN    NaN -175.64
## EB     3 354.51        Inf    NaN    NaN -173.46
## OLS    3 366.41        Inf    NaN    NaN -179.41
## OU     3   -Inf        NaN    NaN    NaN     Inf
## 
## [1] 0
## [1] 1
## [1] 0.1
## [1] "Length_labia"
## 
## Model selection based on AICc:
## 
##        K   AICc Delta_AICc AICcWt Cum.Wt      LL
## BM     3 358.99        Inf    NaN    NaN -175.69
## Lambda 3 358.99        Inf    NaN    NaN -175.69
## EB     3 354.75        Inf    NaN    NaN -173.58
## OLS    3 366.37        Inf    NaN    NaN -179.38
## OU     3   -Inf        NaN    NaN    NaN     Inf
## 
## [1] 0
## [1] 0
## [1] 0.1
## [1] "TracheaRatio"
## 
## Model selection based on AICc:
## 
##        K   AICc Delta_AICc AICcWt Cum.Wt      LL
## BM     3 288.38        Inf    NaN    NaN -140.10
## Lambda 3 284.42        Inf    NaN    NaN -138.12
## EB     3 284.49        Inf    NaN    NaN -138.15
## OLS    3 284.42        Inf    NaN    NaN -138.12
## OU     3   -Inf        NaN    NaN    NaN     Inf
## 
## [1] 0
## [1] 0
## [1] 0.1
## [1] "Tracheal_diameter_EXT"
## 
## Model selection based on AICc:
## 
##        K   AICc Delta_AICc AICcWt Cum.Wt      LL
## BM     3 345.16        Inf    NaN    NaN -168.72
## Lambda 3 341.85        Inf    NaN    NaN -167.07
## EB     3 338.91        Inf    NaN    NaN -165.60
## OLS    3 341.85        Inf    NaN    NaN -167.07
## OU     3   -Inf        NaN    NaN    NaN     Inf
## 
## [1] 0
## [1] 0
## [1] 0.1
## [1] "Tracheal_diameter_INT"
## 
## Model selection based on AICc:
## 
##        K   AICc Delta_AICc AICcWt Cum.Wt      LL
## BM     3 345.67        Inf    NaN    NaN -168.98
## Lambda 3 342.39        Inf    NaN    NaN -167.34
## EB     3 339.56        Inf    NaN    NaN -165.92
## OLS    3 342.39        Inf    NaN    NaN -167.34
## OU     3   -Inf        NaN    NaN    NaN     Inf

In most cases, the BM model is selected as the best fit – we will use it for PPA.

Due to the high number of variables (Hardenberg and Gonzales-Voyer, 2013; Gonzales-Voyer and Hardenberg, 2014), we will decompose the PPA into two separate analyses: one on the effect of body mass on syringeal variables, and one on the effect of syringeal variables on sound production (represented by PC1 from our pPCA).

We can remove one of the two tracheal diameters, since the ratio of the two remains fairly constant through our sample:

RatioTD<-(log10(PPAdata$Tracheal_diameter_INT)/log10(PPAdata$Tracheal_diameter_EXT))*100
RatioTD<-RatioTD[1:20]
min(RatioTD); max(RatioTD) # values between 95 and 99% when log-converted
## [1] 95.01686
## [1] 98.70606

We will be using the external diameter in subsequent analyses.

  • PPA with body mass and syringeal parameters
PPAdata[,c(2:8)]<-log10(PPAdata[,c(2:8)])
colnames(PPAdata)[2:8]<-c("BM","DTL","CSA","LL","Tr","TDe","TDi")

M<-define_model_set(null=c(),
                    one=c(LL~Tr),
                    two=c(LL~Tr, Tr~BM),
                    three=c(LL~TDe),
                    four=c(LL~TDe+Tr),
                    five=c(LL~TDe+Tr, Tr~BM),
                    six=c(LL~DTL),
                    seven=c(LL~DTL, DTL~CSA),
                    eight=c(LL~DTL, DTL~CSA, CSA~Tr),
                    nine=c(LL~DTL, DTL~CSA, CSA~Tr, Tr~BM),
                    ten=c(LL~DTL, DTL~CSA, CSA~TDe),
                    eleven=c(LL~DTL, DTL~CSA, CSA~TDe+Tr),
                    twelve=c(LL~DTL, DTL~CSA, CSA~TDe+Tr, Tr~BM),
                    thirteen=c(LL~DTL+Tr),
                    fourteen=c(LL~DTL+Tr, DTL~CSA),
                    fifteen=c(LL~DTL+Tr, DTL~CSA, CSA~Tr),
                    sixteen=c(LL~DTL+Tr, DTL~CSA, CSA~Tr, Tr~BM),
                    seventeen=c(LL~DTL+Tr, DTL~CSA, CSA~TDe),
                    eighteen=c(LL~DTL+Tr, DTL~CSA, CSA~TDe+Tr),
                    nineteen=c(LL~DTL+Tr, DTL~CSA, CSA~TDe+Tr, Tr~BM),
                    twenty=c(LL~DTL+Tr, Tr~BM),
                    twentytwo=c(LL~DTL+Tr, DTL~CSA, Tr~BM),
                    twentythree=c(LL~DTL+Tr, DTL~CSA, CSA~Tr, Tr~BM),
                    twentyfour=c(LL~DTL+Tr, DTL~CSA, CSA~Tr, Tr~BM),
                    twentyfive=c(LL~DTL+Tr, DTL~CSA, CSA~TDe, Tr~BM),
                    twentysix=c(LL~DTL+Tr, DTL~CSA, CSA~TDe+Tr, Tr~BM),
                    twentyseven=c(LL~DTL+Tr, DTL~CSA, CSA~TDe+Tr, Tr~BM),
                    twentyeight=c(LL~DTL+TDe),
                    twentynine=c(LL~DTL+TDe, DTL~CSA),
                    thirty=c(LL~DTL+TDe, DTL~CSA, CSA~Tr, Tr~BM),
                    thirtyone=c(LL~DTL+TDe, DTL~CSA, CSA~TDe),
                    thirtytwo=c(LL~DTL+TDe, DTL~CSA, CSA~TDe+Tr),
                    thirtythree=c(LL~DTL+TDe, DTL~CSA, CSA~TDe+Tr, Tr~BM),
                    thirtyfour=c(LL~DTL+Tr+TDe),
                    thirtyfive=c(LL~DTL+Tr+TDe, DTL~CSA),
                    thirtysix=c(LL~DTL+Tr+TDe, DTL~CSA, CSA~Tr),
                    thirtyseven=c(LL~DTL+Tr+TDe, DTL~CSA, CSA~Tr, Tr~BM),
                    thirtyeight=c(LL~DTL+Tr+TDe, DTL~CSA, CSA~TDe),
                    thirtynine=c(LL~DTL+Tr+TDe, DTL~CSA, CSA~TDe+Tr),
                    forty=c(LL~DTL+Tr+TDe, DTL~CSA, CSA~TDe+Tr, Tr~BM),
                    fortyone=c(LL~DTL+Tr+TDe, Tr~BM),
                    fortytwo=c(LL~DTL+Tr+TDe, DTL~CSA, Tr~BM),
                    fortythree=c(LL~DTL+Tr+TDe, DTL~CSA, CSA~Tr, Tr~BM),
                    fortyfour=c(LL~DTL+Tr+TDe, DTL~CSA, CSA~TDe, Tr~BM),
                    fortyfive=c(LL~DTL+Tr+TDe, DTL~CSA, CSA~TDe+Tr, Tr~BM),
                    .common=c(TDi~TDe,CSA~BM,DTL~BM,LL~BM,TDe~BM,TDi~BM, LL~BM))

plot_model_set(M)

result<-phylo_path(M, data=PPAdata, tree=treesyr, model='BM', "logistic_IG10"); result
## 5 rows were dropped because they contained NA values.
## Pruned tree to drop species not included in dat.
## A phylogenetic path analysis, on the variables:
##  Continuous:  Tr BM LL DTL CSA TDe TDi 
##  Binary:       
## 
##  Evaluated for these models: null one two three four five six seven eight nine ten eleven twelve thirteen fourteen fifteen sixteen seventeen eighteen nineteen twenty twentytwo twentythree twentyfour twentyfive twentysix twentyseven twentyeight twentynine thirty thirtyone thirtytwo thirtythree thirtyfour thirtyfive thirtysix thirtyseven thirtyeight thirtynine forty fortyone fortytwo fortythree fortyfour fortyfive 
## 
##  Containing 492 phylogenetic regressions, of which 53 unique
s<-summary(result); s
## Warning in summary.phylopath(result): CICc was not calculated for causal models
## where the number of parameters is equal toor larger than the number of species.
plot(s)
## Warning: Removed 45 rows containing missing values (`position_stack()`).
## Warning: Removed 45 rows containing missing values (`geom_text()`).

Not enough data to compile CICc due to small sample size – we will focus on individual regressions.

Simple regressions using PGLS

  • Define single regression formulas between pairs of variables to perform PGLS
varpairs<-combn(c(colnames(PPAdata[2:7]),"IM_CSA","PC1"),2)
varpairs<-varpairs[,-c(11,14,23)] # Hypotheses we are not testing
varpair<-list()
for (i in c(1:7,9,11,12,15,16,20:25)) {
  varpair[[i]]<-paste(varpairs[2,i], varpairs[1,i], sep="~")
}
for (i in c(8,10,13,14,17:19)) {
  varpair[[i]]<-paste(varpairs[1,i], varpairs[2,i], sep="~")
}
  • Perform all PGLS
# Create new dataset to include intrinsic muscle CSA
IM_CSA<-log10(data$IM_CSA)
datanew<-cbind(PPAdata, IM_CSA)

# PGLS
modlist<-as.data.frame(matrix(NA, nrow=length(varpair), ncol=3))
colnames(modlist)<-c("Model","R2","p")
for (i in c(1:19,21:25)) {
  datapair<-cbind.data.frame(datanew$Taxon, datanew[,paste(varpairs[1,i])],
                                datanew[,paste(varpairs[2,i])])
  colnames(datapair)<-c("Taxon",varpairs[1,i],varpairs[2,i])
  comppair<-comparative.data(treesyr, datapair, names.col="Taxon")
  modpair<-pgls(as.formula(paste(varpair[[i]])), data=comppair, lambda="ML")
  modlist[i,1]<-paste(varpair[[i]])
  modlist[i,2]<-summary(modpair)$adj.r.squared
  modlist[i,3]<-summary(modpair)$coef[2,4]
}

# Model 20 (PC1~LL) cannot compute because of problem with 'optim' (ML estimate of lambda is probably out of bounds)
# We can get the value of lambda with 'fitEvolPar', compute the corresponding model, and add it to the data frame of models
data20<-cbind.data.frame(datanew[,paste(varpairs[1,20])],
                                datanew[,paste(varpairs[2,20])])
colnames(data20)<-c(varpairs[1,20],varpairs[2,20])
rownames(data20)<-rownames(PPAdata)
data20<-na.omit(data20)
tree20<-drop.tip(treesyr, setdiff(treesyr$tip.label, rownames(data20)))
fitEvolPar(data20, tree20, "lambda")
## [1] 1
# lambda = 1: we do not need an ML estimate for lambda

for (i in 20) {
  datapair<-cbind.data.frame(datanew$Taxon, datanew[,paste(varpairs[1,i])],
                                datanew[,paste(varpairs[2,i])])
  colnames(datapair)<-c("Taxon",varpairs[1,i],varpairs[2,i])
  comppair<-comparative.data(treesyr, datapair, names.col="Taxon")
  modpair<-pgls(as.formula(paste(varpair[[i]])), data=comppair)
  modlist[i,1]<-paste(varpair[[i]])
  modlist[i,2]<-summary(modpair)$adj.r.squared
  modlist[i,3]<-summary(modpair)$coef[2,4]
}

modlist # list of all models with pseudo R-squared and p-values

Let us check which models are significant

modlist[which(modlist$p<0.05),]
  • Check additional parameters of significant models
for (i in which(modlist$p<0.05)) {
  datapair<-cbind.data.frame(datanew$Taxon, datanew[,paste(varpairs[1,i])],
                                datanew[,paste(varpairs[2,i])])
  colnames(datapair)<-c("Taxon",varpairs[1,i],varpairs[2,i])
  treepair<-drop.tip(treesyr, setdiff(treesyr$tip.label, datapair$Taxon))
  comppair<-comparative.data(treepair, datapair, names.col="Taxon")
  modpair<-pgls(as.formula(paste(varpair[[i]])), data=comppair, lambda="ML")
  print(as.formula(paste(varpair[[i]])))
  print(summary(modpair)$coef)
  print(paste("R-squared =", summary(modpair)$adj.r.squared))
  print(shapiro.test(modpair$residuals))
  plot(modpair$residuals~modpair$fitted, main="Normalized Residuals v Fitted Values")
}
## CSA ~ BM
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 3.8809359  0.2579353 15.046163 5.214496e-12
## BM          0.7983534  0.1519696  5.253376 4.528547e-05
## [1] "R-squared = 0.570796661549721"
## 
##  Shapiro-Wilk normality test
## 
## data:  modpair$residuals
## W = 0.89972, p-value = 0.03456

## LL ~ BM
##              Estimate Std. Error  t value     Pr(>|t|)
## (Intercept) 1.8421281  0.2776243 6.635327 2.392372e-06
## BM          0.3618102  0.1620229 2.233081 3.776862e-02
## [1] "R-squared = 0.16620292803003"
## 
##  Shapiro-Wilk normality test
## 
## data:  modpair$residuals
## W = 0.91549, p-value = 0.07056

## TDe ~ BM
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 2.9380373 0.03027184 97.055114 0.000000e+00
## BM          0.2597343 0.03073778  8.450002 1.114473e-07
## [1] "R-squared = 0.787478059507324"
## 
##  Shapiro-Wilk normality test
## 
## data:  modpair$residuals
## W = 0.95309, p-value = 0.4164

## IM_CSA ~ BM
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 5.5296253  0.1762632 31.371420 1.223466e-13
## BM          0.7066604  0.2148268  3.289442 5.865668e-03
## [1] "R-squared = 0.412269251188365"
## 
##  Shapiro-Wilk normality test
## 
## data:  modpair$residuals
## W = 0.9591, p-value = 0.6767

## DTL ~ CSA
##              Estimate Std. Error  t value     Pr(>|t|)
## (Intercept) 1.9594961  0.4167421 4.701940 0.0001776936
## CSA         0.2658761  0.0855205 3.108917 0.0060599328
## [1] "R-squared = 0.313220671406667"
## 
##  Shapiro-Wilk normality test
## 
## data:  modpair$residuals
## W = 0.97818, p-value = 0.9085

## DTL ~ Tr
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)  4.749126  0.3929901 12.084594 8.546387e-09
## Tr          -0.840497  0.2238408 -3.754888 2.132676e-03
## [1] "R-squared = 0.466176663866154"
## 
##  Shapiro-Wilk normality test
## 
## data:  modpair$residuals
## W = 0.96732, p-value = 0.7935

## IM_CSA ~ DTL
##               Estimate Std. Error     t value     Pr(>|t|)
## (Intercept) -0.1241414  1.4212382 -0.08734738 0.9318360597
## DTL          1.8648043  0.4250838  4.38691027 0.0008854953
## [1] "R-squared = 0.583933186384203"
## 
##  Shapiro-Wilk normality test
## 
## data:  modpair$residuals
## W = 0.92867, p-value = 0.2923

## CSA ~ TDe
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) -1.826354  1.2971701 -1.407953 1.761818e-01
## TDe          2.071955  0.4014746  5.160862 6.563061e-05
## [1] "R-squared = 0.574320293768139"
## 
##  Shapiro-Wilk normality test
## 
## data:  modpair$residuals
## W = 0.93791, p-value = 0.2189

## IM_CSA ~ CSA
##              Estimate Std. Error  t value    Pr(>|t|)
## (Intercept) 2.8405865  0.9615453 2.954189 0.011181491
## CSA         0.6722607  0.1993087 3.372962 0.004995249
## [1] "R-squared = 0.425685221282917"
## 
##  Shapiro-Wilk normality test
## 
## data:  modpair$residuals
## W = 0.97592, p-value = 0.934

## LL ~ Tr
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)  4.931236  0.7850172  6.281692 1.470521e-05
## Tr          -1.412626  0.4479796 -3.153327 6.561557e-03
## [1] "R-squared = 0.358549580541089"
## 
##  Shapiro-Wilk normality test
## 
## data:  modpair$residuals
## W = 0.95051, p-value = 0.4647

## LL ~ IM_CSA
##              Estimate Std. Error  t value     Pr(>|t|)
## (Intercept) 0.8336393 0.37938540 2.197342 0.0467250578
## IM_CSA      0.2828725 0.06017951 4.700479 0.0004148586
## [1] "R-squared = 0.601077116895975"
## 
##  Shapiro-Wilk normality test
## 
## data:  modpair$residuals
## W = 0.89641, p-value = 0.08389

## IM_CSA ~ TDe
##               Estimate Std. Error    t value    Pr(>|t|)
## (Intercept) -0.5083212  1.6185512 -0.3140594 0.758868645
## TDe          2.0997796  0.5135103  4.0890702 0.001501441
## [1] "R-squared = 0.547361569552698"
## 
##  Shapiro-Wilk normality test
## 
## data:  modpair$residuals
## W = 0.97793, p-value = 0.9611

Tracheal length ratio is negatively correlated with distance TL-labia, which is positively correlated with CSA of intrinsic muscles. In hummingbirds, shortening of the trachea may have shortened TL and liberated extra space for intrinsic muscles to develop at the level of the tympanum. Additionally, both shortening of the trachea and higher CSA of intrinsic muscles are correlated with longer labia, which may have an effect on hummingbird vocal production. However, only one of our investigated traits, shortening of the trachea, is correlated with PC1 from our pPCA on sound characters. It is possible that the effect of tracheal length ratio on CSA of intrinsic muscles has an effect on sound production undetected by our analyses due to small sample size; this hypothesis, however, requires further testing.

We can test this hypothesis by including individual acoustic traits and correlating them with morphological traits.

  • Include acoustic traits to the main dataset
dataS<-cbind(datanew[,c(1:7,10)],log10(data[,c(13:17)]))
  • Test effect of each morphoanatomical trait on each acoustic trait (individual PGLS regressions)
# List pairs of variables (x, y) to compile regressions of
pairsound<-expand.grid(colnames(dataS)[2:8],colnames(dataS)[9:13])

# Perform all PGLS
modlistsound<-as.data.frame(matrix(NA, nrow=35, ncol=3))
colnames(modlistsound)<-c("Model","R2","p")
for (i in c(1:10,12:35)) {
  datapgls<-cbind.data.frame(dataS$Taxon, dataS[,paste(pairsound[i,1])],
                                dataS[,paste(pairsound[i,2])])
  colnames(datapgls)<-c("Taxon",paste(pairsound[i,1]),paste(pairsound[i,2]))
  compS<-comparative.data(treesyr, datapgls, names.col="Taxon")
  pglS<-pgls(as.formula(paste(colnames(datapgls[3]),"~",colnames(datapgls[2]))),
             data=compS, lambda="ML")
  modlistsound[i,1]<-as.character(paste(colnames(datapgls[3]),"~",colnames(datapgls[2])))
  modlistsound[i,2]<-summary(pglS)$adj.r.squared
  modlistsound[i,3]<-summary(pglS)$coef[2,4]
}

# Model 20 (freq_max_amp~LL) cannot compute because of problem with 'optim' (ML estimate of lambda is probably out of bounds)
# We can get the value of lambda with 'fitEvolPar', compute the corresponding model, and add it to the data frame of models
i = 11
datapgls<-cbind.data.frame(dataS[,paste(pairsound[i,1])],
                           dataS[,paste(pairsound[i,2])])
colnames(datapgls)<-c(paste(pairsound[i,1]),paste(pairsound[i,2]))
rownames(datapgls)<-dataS$Taxon
datapgls<-na.omit(datapgls)
tree11<-drop.tip(treesyr, setdiff(treesyr$tip.label, rownames(datapgls)))
fitEvolPar(datapgls, tree11, "lambda")
## [1] 1
# lambda = 1: we do not need an ML estimate for lambda
datapgls<-cbind.data.frame(dataS$Taxon,dataS[,paste(pairsound[i,1])],
                           dataS[,paste(pairsound[i,2])])
colnames(datapgls)<-c("Taxon", paste(pairsound[i,1]),paste(pairsound[i,2]))
compS<-comparative.data(treesyr, datapgls, names.col="Taxon")
pglS<-pgls(as.formula(paste(colnames(datapgls[3]),"~",colnames(datapgls[2]))),
             data=compS)
modlistsound[i,1]<-as.character(paste(colnames(datapgls[3]),"~",colnames(datapgls[2])))
modlistsound[i,2]<-summary(pglS)$adj.r.squared
modlistsound[i,3]<-summary(pglS)$coef[2,4]

# Check for significance
min(modlistsound$R2); max(modlistsound$R2)
## [1] -0.08708794
## [1] 0.1983234
which(modlistsound$p<0.05)
## integer(0)

None of the regressions are significant, probably due to small sample size.

Compare with previous analysis by C. Eliason using phylogenetic 2B-PLS

  • Extract data and perform the analysis
# Data matrices
dataS2<-dataS[,c(2:7,9:13)] # No IM CSA due to a high amount of missing data (all non-hummingbirds)
dataS2<-na.omit(dataS2)
Morphol<-dataS2[,1:6]
Sound<-dataS2[,7:11]
treepls<-drop.tip(treesyr, setdiff(treesyr$tip.label, rownames(dataS2)))

# 2B-PLS
twobpls<-phylo.integration(log1p(Morphol), log1p(Sound), phy=treepls, iter=9999) # with phylogeny
summary(twobpls); plot(twobpls)

twobpls2<-two.b.pls(log1p(Morphol), log1p(Sound), iter=9999) # without phylogeny
summary(twobpls2); plot(twobpls2)

  • Same with pPCA
# Perform pPCAs
PCAmorph<-phyl.pca(treepls, Morphol, method="lambda", opt="REML")
PCAsound<-phyl.pca(treepls, Sound, method="lambda", opt="REML")
summary(PCAmorph); summary(PCAsound)
## Importance of components:
##                               PC1        PC2        PC3        PC4        PC5
## Standard deviation     0.07129062 0.04126277 0.02571364 0.01567533 0.01033804
## Proportion of Variance 0.64650565 0.21658303 0.08410751 0.03125653 0.01359514
## Cumulative Proportion  0.64650565 0.86308868 0.94719619 0.97845272 0.99204786
##                                PC6
## Standard deviation     0.007906572
## Proportion of Variance 0.007952140
## Cumulative Proportion  1.000000000
## Importance of components:
##                               PC1        PC2        PC3         PC4
## Standard deviation     0.03562838 0.02428517 0.01243139 0.003045657
## Proportion of Variance 0.62695850 0.29129216 0.07632838 0.004581508
## Cumulative Proportion  0.62695850 0.91825066 0.99457904 0.999160548
##                                 PC5
## Standard deviation     0.0013036906
## Proportion of Variance 0.0008394516
## Cumulative Proportion  1.0000000000
PC1morph<-PCAmorph$S[,1]
PC1sound<-PCAsound$S[,1]

# PGLS regression between the two PC1s
dataPC1<-cbind.data.frame(names(PC1morph), PC1morph, PC1sound); colnames(dataPC1)[1]<-"Taxon"
dataPCAreg<-comparative.data(treepls, dataPC1, names.col="Taxon")
PCAreg<-pgls(PC1sound~PC1morph, dataPCAreg, lambda="ML")
summary(PCAreg)
## 
## Call:
## pgls(formula = PC1sound ~ PC1morph, data = dataPCAreg, lambda = "ML")
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.067543 -0.002066  0.007062  0.014598  0.048378 
## 
## Branch length transformations:
## 
## kappa  [Fix]  : 1.000
## lambda [ ML]  : 0.000
##    lower bound : 0.000, p = 1    
##    upper bound : 1.000, p = 0.036173
##    95.0% CI   : (NA, 0.964)
## delta  [Fix]  : 1.000
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)  0.037304   0.077342  0.4823   0.6383
## PC1morph    -0.237119   0.173574 -1.3661   0.1970
## 
## Residual standard error: 0.0345 on 12 degrees of freedom
## Multiple R-squared: 0.1346,  Adjusted R-squared: 0.06247 
## F-statistic: 1.866 on 1 and 12 DF,  p-value: 0.197

Results are similar to those obtained with our own dataset: the regressions are not significant, probably due to small sample size.

Ancestral state reconstructions

Since phylogenetic signal (l. 140) was high and significant for all traits except distance TL-labia and IM CSA, we can optimize five out of seven morphoanatomical traits using contMap.

We had not tested for tracheal elongation ratio, so we need to perform this first.

# Estimating lambda
Tr<-dataS$Tr; names(Tr)<-rownames(dataS)
Tr<-Tr[!is.na(Tr)]
treeTr<-drop.tip(treesyr, setdiff(treesyr$tip.label, names(Tr)))
phylosig(treeTr, Tr, method="lambda", test=TRUE)
## 
## Phylogenetic signal lambda : 0.901349 
## logL(lambda) : 18.4337 
## LR(lambda=0) : 12.2014 
## P-value (based on LR test) : 0.000477539
# Evolutionary model selection
for (m in 1:length(models)) {
  fit[[m]]=fitContinuous(treeTr, Tr, model=models[m], ncores=2)
}
## Warning in fitContinuous(treeTr, Tr, model = models[m], ncores = 2): 
## Parameter estimates appear at bounds:
##  a
modTr<-modSel.geiger(fit[[1]],fit[[2]],fit[[3]],fit[[4]],fit[[5]],fit[[6]])
modTr

Phylogenetic signal is high and significant for tracheal elongation ratio. A Brownian Motion model is the best fit.

  • Ancestral reconstructions
dataplot=list(); fit=list(); obj=list()
for (i in c(2,4:7)) {
  dataplot[[i]]<-dataS[,i]; names(dataplot[[i]])<-rownames(dataS)
  dataplot[[i]]<-na.omit(dataplot[[i]])
  treeplot<-drop.tip(treesyr, setdiff(treesyr$tip.label, names(dataplot[[i]])))
  fit[[i]]<-fastAnc(treeplot, dataplot[[i]], vars=TRUE, CI=TRUE)
  obj[[i]]<-setMap(contMap(treeplot, dataplot[[i]]),
                   colors=rev(brewer.pal(10,"Spectral")))
}
  • Plot for all traits (all generated at once)
for (i in c(2,4:7)) {
  plot(obj[[i]])
  title(paste('Ancestral state reconstruction for', colnames(dataS)[i]))
}

  • Same for ratios of traits to body mass to correct for influence of body mass.
dataplot=list(); fit=list(); obj=list()
for (i in c(4:7)) {
  dataplot[[i]]<-dataS[,i]/dataS$BM; names(dataplot[[i]])<-rownames(dataS)
  dataplot[[i]]<-na.omit(dataplot[[i]])
  treeplot<-drop.tip(treesyr, setdiff(treesyr$tip.label, names(dataplot[[i]])))
  fit[[i]]<-fastAnc(treeplot, dataplot[[i]], vars=TRUE, CI=TRUE)
  obj[[i]]<-setMap(contMap(treeplot, dataplot[[i]]),
                   colors=rev(brewer.pal(10,"Spectral")))
}
  • Plot for all traits (all generated at once)
for (i in c(4:7)) {
  plot(obj[[i]])
  title(paste('Ancestral state reconstruction for', colnames(dataS)[i], 'corrected for body mass'))
}

Box plots and t-tests for labia length (to showcase lower values for Phaethornis)

# With labia length
ggplot(datasyr, aes(x=Family, y=log(Length_labia), fill=Family)) +
  geom_boxplot() +
  geom_jitter(shape=16, position=position_jitter(0.2)) +
  scale_fill_brewer(palette="Dark2")

# With ratio (labia length/body mass)
ggplot(datasyr, aes(x=Family, y=log(Length_labia/Body_mass), fill=Family)) +
  geom_boxplot() +
  geom_jitter(shape=16, position=position_jitter(0.2)) +
  scale_fill_brewer(palette="Dark2")

  • T-test for difference between swifts and hummingbirds in labia length
dataswifts<-datasyr %>% filter(Family=="Apodidae")
datahummingbirds<-datasyr %>% filter(Family=="Trochilidae")

## With labia length
t.test(log(datahummingbirds$Length_labia), log(dataswifts$Length_labia)) # Marginally significant
## 
##  Welch Two Sample t-test
## 
## data:  log(datahummingbirds$Length_labia) and log(dataswifts$Length_labia)
## t = 2.5195, df = 6.0742, p-value = 0.04485
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.01949767 1.21182602
## sample estimates:
## mean of x mean of y 
##  5.761000  5.145339
## With ratio (labia length/body mass)
t.test(log(datahummingbirds$Length_labia/datahummingbirds$Body_mass),
       log(dataswifts$Length_labia/dataswifts$Body_mass)) # Significant
## 
##  Welch Two Sample t-test
## 
## data:  log(datahummingbirds$Length_labia/datahummingbirds$Body_mass) and log(dataswifts$Length_labia/dataswifts$Body_mass)
## t = 6.3714, df = 6.0045, p-value = 0.0007001
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1.062936 2.388021
## sample estimates:
## mean of x mean of y 
##  4.020682  2.295204

Box plots and t-tests for tracheal elongation ratio (TER)

familyTER<-c("Caprimulgidae", rep("Apodidae", 3), rep("Trochilidae", 13))
TracheaRatioclean<-na.omit(TracheaRatio)
BMTER<-data$Body_mass[!is.na(data$Tracheal_length)]
TER<-as.data.frame(cbind(familyTER, TracheaRatioclean, BMTER))
colnames(TER)<-c("Family","TER", "Body_mass")
TER[,2]<-as.numeric(TER[,2]); TER[,3]<-as.numeric(TER[,3])

# With TER
ggplot(TER, aes(x=Family, y=log(TER), fill=Family)) +
  geom_boxplot() +
  geom_jitter(shape=16, position=position_jitter(0.2)) +
  scale_fill_brewer(palette="Dark2")

# With ratio (TER/body mass)
ggplot(TER, aes(x=Family, y=log(TER/Body_mass), fill=Family)) +
  geom_boxplot() +
  geom_jitter(shape=16, position=position_jitter(0.2)) +
  scale_fill_brewer(palette="Dark2")

  • T-test for difference between swifts and hummingbirds in TER
TERswifts<-TER %>% filter(Family=="Apodidae")
TERhummingbirds<-TER %>% filter(Family=="Trochilidae")

## With labia length
t.test(log(TERswifts$TER), log(TERhummingbirds$TER)) # Significant
## 
##  Welch Two Sample t-test
## 
## data:  log(TERswifts$TER) and log(TERhummingbirds$TER)
## t = 11.473, df = 13.393, p-value = 2.634e-08
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.4528126 0.6621264
## sample estimates:
## mean of x mean of y 
##  4.452572  3.895102
## With ratio (labia length/body mass)
t.test(log(TERhummingbirds$TER/TERhummingbirds$Body_mass), log(TERswifts$TER/TERswifts$Body_mass)) # Not significant
## 
##  Welch Two Sample t-test
## 
## data:  log(TERhummingbirds$TER/TERhummingbirds$Body_mass) and log(TERswifts$TER/TERswifts$Body_mass)
## t = 0.93656, df = 3.1209, p-value = 0.4156
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.9988831  1.8582229
## sample estimates:
## mean of x mean of y 
##  2.135811  1.706141

References

Gonzalez-Voyer, A. Hardenberg, A. von. 2014. An introduction to phylogenetic path analysis. In: Garamszegi, L. Z. (Ed.). Modern phylogenetic comparative methods and their application in evolutionary biology: concepts and practices. Berlin: Springer, 201–229. https://doi.org/10.1007/978-3-662-43550-2_8

Hardenberg, A. von, Gonzalez‐Voyer, A. 2013. Disentangling evolutionary cause-effect relationships with phylogenetic confirmatory path analysis. Evolution 67, 378–387. https://doi.org/10.1111/j.1558-5646.2012.01790.x

Revell, L.J. 2009. Size-correction and principal components for interspecific comparative studies. Evolution 63, 3258–3268. https://doi.org/10.1111/j.1558-5646.2009.00804.x

Uyeda, J.C., Caetano, D.S., Pennell, M.W. 2015. Comparative analysis of principal components can be misleading. Systematic Biology 64, 677–689. https://doi.org/10.1093/sysbio/syv019